Spiral of Silence in Large Language Model Agents
Zhong, Mingze, Fang, Meng, Shi, Zijing, Huang, Yuxuan, Zheng, Shunfeng, Du, Yali, Chen, Ling, Wang, Jun
The Spiral of Silence (SoS) theory holds that individuals with minority views often refrain from speaking out for fear of social isolation, enabling majority positions to dominate public discourse. When the 'agents' are large language models (LLMs), however, the classical psychological explanation is not directly applicable, since SoS was developed for human societies. This raises a central question: can SoS-like dynamics nevertheless emerge from purely statistical language generation in LLM collectives? We propose an evaluation framework for examining SoS in LLM agents. Specifically, we consider four controlled conditions that systematically vary the availability of 'History' and 'Persona' signals. Opinion dynamics are assessed using trend tests such as Mann-Kendall and Spearman's rank, along with concentration measures including kurtosis and interquartile range. Experiments across open-source and closed-source models show that history and persona together produce strong majority dominance and replicate SoS patterns; history signals alone induce strong anchoring; and persona signals alone foster diverse but uncorrelated opinions, indicating that without historical anchoring, SoS dynamics cannot emerge. The work bridges computational sociology and responsible AI design, highlighting the need to monitor and mitigate emergent conformity in LLM-agent systems.
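The trend and concentration statistics named above can be sketched on a toy opinion series (an illustration with assumed data, not the paper's evaluation code; the Mann-Kendall S statistic is computed directly, the rest via SciPy):

```python
import numpy as np
from scipy.stats import spearmanr, kurtosis, iqr

# Toy series: the share of agents voicing the minority view over ten rounds.
opinions = np.array([0.9, 0.8, 0.8, 0.7, 0.6, 0.5, 0.5, 0.4, 0.3, 0.2])
t = np.arange(len(opinions))

# Mann-Kendall S statistic: sum of signs of all later-minus-earlier differences.
n_ = len(opinions)
s = sum(np.sign(opinions[j] - opinions[i])
        for i in range(n_) for j in range(i + 1, n_))

rho, p = spearmanr(t, opinions)  # monotonic trend against time
k = kurtosis(opinions)           # excess kurtosis: peakedness of the opinion distribution
spread = iqr(opinions)           # interquartile range: concentration of opinions

print(f"MK S={s:.0f}, Spearman rho={rho:.2f} (p={p:.3f}), "
      f"kurtosis={k:.2f}, IQR={spread:.2f}")
```

A strongly negative S and rho would indicate the steady silencing trend the paper associates with SoS dynamics, while a shrinking IQR would signal opinion concentration.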
Datasets for Navigating Sensitive Topics in Recommendation Systems
Kovacs, Amelia, Chee, Jerry, Kazemian, Kimia, Dean, Sarah
Personalized AI systems, from recommendation systems to chatbots, are a prevalent method for distributing content to users based on their learned preferences. However, there is growing concern about the adverse effects of these systems, including their potential tendency to expose users to sensitive or harmful material, negatively impacting overall well-being. To address this concern quantitatively, it is necessary to create datasets with relevant sensitivity labels for content, enabling researchers to evaluate personalized systems beyond mere engagement metrics. To this end, we introduce two novel datasets that include a taxonomy of sensitivity labels alongside user-content ratings: one that integrates MovieLens rating data with content warnings from the Does the Dog Die? community ratings website, and another that combines fan-fiction interaction data and user-generated warnings from Archive of Our Own.
Social Influence Distorts Ratings in Online Interfaces
Kontalexi, Marina, Gelastopoulos, Alexandros, Analytis, Pantelis P.
Theoretical work on sequential choice and large-scale experiments in online ranking and voting systems have demonstrated that social influence can have a drastic impact on social and technological systems. Yet the effect of social influence on online rating systems remains understudied, and the few existing contributions suggest that online ratings would self-correct given enough users. Here, we propose a new framework for studying the effect of social influence on online ratings. We start from the assumption that people are influenced linearly by the observed average rating, but postulate that their propensity to be influenced varies. When the weight people assign to the observed average depends only on their own latent rating, the resulting system is linear, but the long-term rating may substantially deviate from the true mean rating. When the weight people put on the observed average depends on both their own latent rating and the observed average rating, the resulting system is non-linear and may support multiple equilibria, suggesting that ratings might be path-dependent and deviations dramatic. Our results highlight potential limitations in crowdsourced information aggregation and can inform the design of more robust online rating systems.
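The linear case described above can be sketched in a minimal simulation (our own illustrative choice of weight function and parameters, not the authors' implementation): each arriving user reports a linear mix of their latent rating and the displayed running average, with a conformity weight that depends only on their own latent rating.

```python
import random

random.seed(0)

def weight(x):
    # Assumed form: users with below-average latent ratings conform more.
    return 0.8 if x < 3.5 else 0.2

latent = [random.gauss(3.5, 1.0) for _ in range(5000)]
true_mean = sum(latent) / len(latent)

total, n = 0.0, 0
for x in latent:
    avg = total / n if n else x       # the first user sees no average yet
    w = weight(x)
    total += (1 - w) * x + w * avg    # linear pull toward the displayed mean
    n += 1

displayed = total / n
print(f"true mean {true_mean:.3f}, long-run displayed mean {displayed:.3f}")
```

Because low-rating users here lend less weight to their own view, the displayed average settles noticeably above the true mean, illustrating how a linear system can still produce a persistent deviation.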
A Measure of the System Dependence of Automated Metrics
von Däniken, Pius, Deriu, Jan, Cieliebak, Mark
Automated metrics for Machine Translation have made significant progress, with the goal of replacing expensive and time-consuming human evaluations. These metrics are typically assessed by their correlation with human judgments, which captures the monotonic relationship between human and metric scores. However, we argue that it is equally important to ensure that metrics treat all systems fairly and consistently. In this paper, we introduce a method to evaluate this aspect.
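The fairness concern can be illustrated with a toy check (our own sketch with invented numbers, not the measure the paper introduces): a metric can track human judgments well in aggregate while still systematically inflating one system's scores relative to the others.

```python
from statistics import mean

# Hypothetical human and metric scores for two MT systems (made-up data).
human  = {"sysA": [0.8, 0.7, 0.9], "sysB": [0.6, 0.5, 0.7]}
metric = {"sysA": [0.75, 0.7, 0.85], "sysB": [0.8, 0.7, 0.9]}

# Per-system offset of the metric from human judgments: a metric that treats
# systems consistently should show similar offsets across systems.
offsets = {s: mean(metric[s]) - mean(human[s]) for s in human}
print(offsets)
```

Here the metric is roughly calibrated on sysA but inflates sysB by about 0.2, the kind of system-dependent behavior a pooled correlation alone can miss.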
End-to-end Training for Recommendation with Language-based User Profiles
Gao, Zhaolin, Zhou, Joyce, Dai, Yijia, Joachims, Thorsten
Many online platforms maintain user profiles for personalization. Unfortunately, these profiles are typically not interpretable or easily modifiable by the user. To remedy this shortcoming, we explore natural language-based user profiles, as they promise enhanced transparency and scrutability of recommender systems. While existing work has shown that language-based profiles from standard LLMs can be effective, such generalist LLMs are unlikely to be optimal for this task. In this paper, we introduce LangPTune, the first end-to-end learning method for training LLMs to produce language-based user profiles that optimize recommendation effectiveness. Through comprehensive evaluations of LangPTune across various training configurations and benchmarks, we demonstrate that our approach significantly outperforms existing profile-based methods. In addition, it approaches the performance of state-of-the-art, less transparent recommender systems, providing a robust and interpretable alternative to conventional systems. Finally, we validate the relative interpretability of these language-based user profiles through user studies involving crowdworkers and GPT-4-based evaluations. Implementation of LangPTune can be found at https://github.com/ZhaolinGao/LangPTune.
Performance of Recent Large Language Models for a Low-Resourced Language
Jayakody, Ravindu, Dias, Gihan
Large Language Models (LLMs) have shown significant advances in the past year. In addition to new versions of GPT and Llama, several other LLMs have been introduced recently. Some of these are open models available for download and modification. Although multilingual large language models have been available for some time, their performance on low-resourced languages such as Sinhala has been poor. We evaluated four recent LLMs on their performance directly in the Sinhala language, and by translation to and from English. We also evaluated their fine-tunability with a small amount of fine-tuning data. Claude and GPT 4o perform well out-of-the-box and do significantly better than previous versions. Llama and Mistral perform poorly but show some promise of improvement with fine-tuning.
Understanding Subjectivity through the Lens of Motivational Context in Model-Generated Image Satisfaction
Dutta, Senjuti, Chen, Sherol, Mak, Sunny, Ahmad, Amnah, Collins, Katherine, Butryna, Alena, Ramachandran, Deepak, Dvijotham, Krishnamurthy, Pavlick, Ellie, Rajakumar, Ravi
Image generation models are poised to become ubiquitous in a range of applications. These models are often fine-tuned and evaluated using human quality judgments that assume a universal standard, failing to consider the subjectivity of such tasks. To investigate how to quantify subjectivity, and the scale of its impact, we measure how assessments differ among human annotators across different use cases. Simulating the effects of ordinarily latent elements of annotators' subjectivity, we contrive a set of motivations (t-shirt graphics, presentation visuals, and phone background images) to contextualize a set of crowdsourcing tasks. Our results show that human evaluations of images vary within individual contexts and across combinations of contexts. Three key factors affecting this subjectivity are image appearance, image alignment with text, and representation of objects mentioned in the text. Our study highlights the importance of taking individual users and contexts into account, both when building and when evaluating generative models.
Performance rating in chess, tennis, and other contexts
In this note, I introduce Estimated Performance Rating (PR$^e$), a novel system for evaluating player performance in sports and games. PR$^e$ addresses a key limitation of the Tournament Performance Rating (TPR) system, which is undefined for zero or perfect scores in a series of games. PR$^e$ is defined as the rating that solves an optimization problem related to scoring probability, making it applicable for any performance level. The main theorem establishes that the PR$^e$ of a player is equivalent to the TPR whenever the latter is defined. I then apply this system to historically significant win-streaks in association football, tennis, and chess. Beyond sports, PR$^e$ has broad applicability in domains where Elo ratings are used, from college rankings to the evaluation of large language models.
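Under the Elo model such ratings assume, the classical TPR is the rating whose expected total score against the given opponents equals the achieved score. A bisection sketch of that baseline equation (our own illustration with made-up numbers, not the note's PR$^e$ construction, which generalizes it to zero and perfect scores):

```python
def expected_score(rating, opponent):
    """Elo expected score of `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))

def tpr(opponents, score, lo=0.0, hi=4000.0, tol=1e-6):
    """Rating whose summed expected scores equal the achieved score.
    Diverges for zero or perfect scores, the limitation PR^e addresses."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(expected_score(mid, o) for o in opponents) < score:
            lo = mid  # too low: expected total below achieved score
        else:
            hi = mid
    return (lo + hi) / 2.0

# Example: 2.5/4 against four 2400-rated opponents.
print(round(tpr([2400] * 4, 2.5)))
```

Bisection works here because the expected total score is strictly increasing in the candidate rating; a zero or perfect score pushes the root to minus or plus infinity, which is exactly where TPR becomes undefined.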